UNIFIED DIGITAL ARCHITECTURE

RELATED APPLICATIONS 

This application claims the benefit of provisional patent application Serial No.
60/262,401, filed January 16, 2001, for "Unilink Digital Architecture" (Docket No.
RAL920010003US1).

This application is related to the following copending applications, all of which
are incorporated herein by reference: Serial No. , filed ,
for "Architecture For Advanced Serial Link Between Two Cards" (Docket No.
RAL920010004US2); Serial No. , filed , for
"Analog Unidirectional Serial Link Architecture" (Docket No. RAL920010005US2); and
Serial No. , filed , for "Apparatus And Method For
Oversampling With Evenly Spaced Samples" (Docket No. RAL920010011US2).

FIELD OF THE INVENTION 

The present invention relates to the logic architecture that provides the digital 
functions required to enable a unified, unidirectional serial link across wired media, such 
as a chip-to-chip or a card-to-card interconnect. 

BACKGROUND OF THE INVENTION 

Serial data must be transmitted across wired media. The transmit and receive 
sections include chips wired to one another and card-to-card interconnects. The 
transmission media can be a combination of printed circuit board, connectors, backplane 
wiring, fiber or cable. The interconnect can include its own power, data and clocking 
sources or may derive these functions from a host module. Such data has typically been
transmitted through a parallel data bus, such as ISA, PCI, PCI-X and the like. One
drawback of such parallel links is their moderate data transmission rate: improved
microprocessor performance yields data transfer bandwidths that typically outpace
I/O transfer rates. Also, the ASIC I/O count is high. In addition, the system integration
I/O count using a parallel data bus is high. Finally, the overall system cost associated
with the use of the parallel data bus tends to be high.

Related art shows attempts to overcome these difficulties and drawbacks by 
utilizing serial communication systems involving a variety of schemes. For example, 
some have used a carrierless amplitude/phase (CAP) modulation scheme. Others have 
used linear compression/decompression and digital signal processing techniques for 
frequency modulation. Still others use a linear (analog) phase rotator to recover only the 
carrier of an incoming signal. Some transmit using a pass band which limits the 
bandwidth of the frequencies being passed, rather than a baseband channel wherein the 
signals are not shared and the frequencies are not restricted. 

BRIEF DESCRIPTION OF THE INVENTION 
A unified digital architecture comprises a logic transmitter portion and logic 
receiver portion. The logic transmitter portion contains a phase locked loop (PLL), a 
dibit data register, a finite impulse response (FIR) filter and a transmit data register. It 
also uses a pseudo-random bit stream (PRBS) generator and checker. The digital receiver 
portion contains a PLL, a FIR phase rotator, a phase rotator control state machine and a 
clock buffer. It can also include a pseudo-random bit stream (PRBS) generator and a 
checker for diagnostics.

The invention comprises a unified serial link system and method for transmitting
digital data across wired media including a transmitter and a receiver. The transmitter
PLL comprises a dual loop phase locked loop control circuit having a digital coarse loop
and an analog fine loop. The transmitter PLL contains a four-stage voltage controlled
oscillator adapted to run at full bit frequency. The coarse loop includes a reference
generator, a voltage comparator, a PLL control logic, a digital to analog counter and a
low pass filter. The fine loop includes the oscillator, a 4X-frequency divider, a
phase-frequency detector, a charge pump and a loop filter. The transmitter contains a
pseudo-random bit stream generator. The receiver includes a dual loop phase locked loop
control circuit and an oversampled half-rate system with a multi-step phase rotation,
controlled by a signal generated by an early/late comparator.

BRIEF DESCRIPTION OF THE DRAWINGS 

Figure 1 is an illustration of the control circuit for a phase locked loop;

Figure 2 is a block diagram of the transmitter architecture of the present 
invention; 

Figure 3 is a block diagram of the receiver architecture of the present invention; 
Figure 4 shows an averaging pattern for a phase rotator control; and 
Figure 5 shows another embodiment of an averaging pattern for a phase rotator 
control. 

DETAILED DESCRIPTION OF THE INVENTION 

The PLL control architecture is intended to provide the coarse PLL control loop
for a dual-loop PLL. Lock is determined by comparing two Gray counters running on the
reference and PLL clocks. The digital to analog conversion (DAC) bits, which set the
coarse control voltage for the PLL, are controlled by monitoring a signal from the PLL
(V_Fine_H) that indicates which half of its operating range the fine control voltage is
in. Figure 1 illustrates the PLL control circuit.

More specifically, Figure 1 shows a full data rate PLL 110. This PLL is the clock 
source for the transmitted data and runs at the full data rate of e.g. 2 to 3 Gbps. A stable 
frequency from a reference clock 112 is required for determining if the PLL is locked to 
its correct frequency. The clock 112 operates at one-fourth of the full data rate. For 
example, a 625 MHz clock rate is used for an operational data rate of 2.5 Gbps. A single
clock phase is buffered, is brought out of the PLL, and is used to drive into a phase buffer 
circuit. 

The PLL contains a four-stage voltage controlled ring oscillator (VCO), a 4X 
frequency divider, phase-frequency detector, charge pump and loop filter. These 
elements form the 'fine' control loop. The VCO has both a 'fine' analog and a 'coarse'
digital control voltage in order to minimize the required gain of the fine loop. The VCO 
is capable of changing the speed of oscillation by adjusting the local feedback within a 
delay cell, as well as controlling feedback within the VCO for speed enhancement. In 
addition to the fine control loop elements, the PLL 110 contains a reference generator, a 
voltage comparator, PLL control logic, a low-pass filter and a digital to analog counter 
132. These elements form the coarse control loop. 

The fine control loop is a conventional analog loop and is intended to provide a 
stable, low-noise, low-jitter clock source for the transmitter. The details of the fine
control loop are well known in the art and do not by themselves comprise any part of the
present invention.

The coarse control loop is a digital representation of a conventional analog control 
loop based on a 'leaky' loop filter capacitor. This type of loop relies on leakage from the 
loop filter cap to drive the control voltage in a particular direction regardless of the 
frequency of the VCO. A phase detector and charge pump that can only increase the charge
on the filter cap compensate for this leakage. The loop is stable when the charge being
added to the cap balances the charge that is leaking.

A signal from the reference source 112 is fed into a reference counter 118. A
prescaler 114 divides the frequency to one-fourth. A frequency comparator 120 compares
the count from the PLL counter 116 with that of the reference counter 118 to determine
if the PLL 110 and the reference clock 112 are running at the same frequency and are
counting at the same rate. The two counters 116, 118 are compared over a period of time,
e.g. a 10-bit count, as determined by the frequency timer 122. If, over this 10-bit count,
the comparator determines that the counted values are maintaining a fixed distance from
one another, the comparator 120 confirms that the PLL 110 is locked. The PLL lock 124
monitors the output of the frequency timer. Every time the frequency timer 122 reaches
its maximum count, the PLL counter 116 and the reference counter 118 are reset. Thus,
this comparison is performed each time the frequency timer 122 times out. If, during the
interval, the two counters 116, 118 have not compared equal to one another (that is, the
frequency comparator 120 has not become true), the logic assumes that the clocks are
locked because the counters are not catching up with one another. If, however, the
frequency timer 122 times out and the frequency comparator 120 has matched the reference
count and the PLL count, it declares that the PLL is unlocked. These two counters are
reset if the frequency timer 122 declares that the PLL is unlocked.
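
The lock decision described above can be sketched in a few lines of Python. This is a simplified behavioural model and not the circuit of Figure 1: the counters are plain integers rather than Gray coded, and the names `freq_ratio` (prescaled PLL frequency divided by reference frequency) and `tolerance` are illustrative assumptions. Lock is declared when the distance between the two counts stays fixed over one 10-bit timer interval.

```python
def pll_is_locked(freq_ratio, window_bits=10, tolerance=2):
    """Return True if the PLL count keeps a fixed distance from the
    reference count for a whole frequency-timer interval.

    freq_ratio and tolerance are assumptions made for this sketch.
    """
    window = 1 << window_bits          # a 10-bit timer gives 1024 reference cycles
    pll_phase = 0.0
    for ref_count in range(1, window + 1):
        pll_phase += freq_ratio        # PLL counter advances freq_ratio per cycle
        drift = int(pll_phase) - ref_count
        if abs(drift) >= tolerance:    # one counter is catching up with the other
            return False
    return True


if __name__ == "__main__":
    for ratio in (1.0, 0.99, 1.01):
        print(ratio, pll_is_locked(ratio))
```

In this model a 1% frequency error in either direction is caught well inside one timer interval, while an exactly matched frequency passes.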

A V_fine_H signal 126 is introduced from the analog section of the transmitter 
and indicates that the fine loop is at the center of its range. When the PLL locks and the 
fine loop is centered, the signal can move up and down from the center with some degree 
of latitude. This then allows for perturbations of the system, such as temperature 
changes. The signals from the PLL and the fine loop are asynchronous and go to the 
sample latch 128. If the V_fine_H signal is not on and the PLL lock signal is not on, then 
an 'up' signal is applied. This causes the decision counter 134 to count up, thereby 
causing the DAC counter 132 to also count up. When both of these conditions are met, 
the system stops counting up. 

The DAC counter is a binary search counter with 64 possible steps counting up 
from 000000 to 111111. The counter steps through all of the different settings until it 
finds a setting where the PLL will lock. 
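
As a rough illustration of the stepping behaviour just described, the following Python sketch walks the 64 DAC codes upward until a lock is reported. The predicate `pll_locks_at` is hypothetical and stands in for the hardware lock indication; the simple upward sweep is an assumption based on the description of counting from 000000 to 111111.

```python
def find_coarse_setting(pll_locks_at, steps=64):
    """Step the DAC code upward and return the first code at which
    the (hypothetical) lock predicate is true, or None if none locks."""
    for code in range(steps):
        if pll_locks_at(code):
            return code
    return None


if __name__ == "__main__":
    # Assumed example: the PLL can lock only for codes 37 through 41.
    code = find_coarse_setting(lambda c: 37 <= c <= 41)
    print(format(code, "06b"))
```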

The PLL control logic in the coarse control loop has an up/down counter 130, the 
value of which represents the charge on the loop filter cap. This counter 130 is slowly 
decremented to represent leakage. The voltage comparator is high or low depending on 
whether the fine control voltage is operating in the upper or the lower half of its range. 
To balance the leakage, the control logic samples the comparator output. After multiple 
samples showing upper range operation, the up/down counter is incremented to represent 
adding charge to the loop filter cap. The DAC and low-pass filter convert the up/down
counter output to a control voltage. The coarse control loop is intended to compensate
for manufacturing process variations and relatively low frequency but large changes due
to power supply and temperature drift.
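
The digital 'leaky capacitor' behaviour can be modelled as follows. This is a minimal sketch under stated assumptions: the counter width (6 bits), the number of consecutive upper-range samples needed before an increment (`samples_needed`), and the leak interval (`leak_period`) are illustrative values that do not come from the description.

```python
class CoarseLoopCounter:
    """Up/down counter whose value stands for the charge on the loop filter cap."""

    def __init__(self, bits=6, samples_needed=4, leak_period=32):
        self.max_value = (1 << bits) - 1
        self.samples_needed = samples_needed   # assumed vote length
        self.leak_period = leak_period         # assumed leak interval in ticks
        self.value = 0
        self.high_samples = 0
        self.ticks = 0

    def step(self, fine_voltage_high):
        """Advance one sample; fine_voltage_high is the comparator output."""
        self.ticks += 1
        # Slow decrement representing leakage from the capacitor.
        if self.ticks % self.leak_period == 0 and self.value > 0:
            self.value -= 1
        # Several upper-range samples in a row add one unit of charge.
        if fine_voltage_high:
            self.high_samples += 1
            if self.high_samples == self.samples_needed:
                self.value = min(self.value + 1, self.max_value)
                self.high_samples = 0
        else:
            self.high_samples = 0
        return self.value
```

The loop settles where increments driven by the comparator balance the periodic decrement, which mirrors the charge balance of the analog loop it represents.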

The transmit phase buffer consists of circuits which are designed to interface to 
the pre-drive section of the PLL and provide only light loading to the PLL. The phase 
buffer then drives out to a latch providing the clock necessary for a full rate design. The 
phase buffer must also provide adequate rise and fall times, taking into account the 
estimated net loadings. 

The driver/equalizer consists of current-mode differential drive circuits which are
controlled by a finite impulse response (FIR) type filter function. This filter is
implemented by the combination of a shift register containing the current outgoing data
bit and a history of three previous bits. This shift register, in turn, controls the activation
of weighted current drivers. The output transfer function is of the general form

$H(z) = A b_0 + A b_1 z^{-1} + A b_2 z^{-2} + A b_3 z^{-3}$

wherein the values of the b_n coefficients are negative.
The numerical values of the coefficients are set by register values in the logic. The
determining factors for the values of these coefficients include the characteristics of the
transmission media, the speed of transmission, the type of board connector used, the type
of chip package, etc. The data bits are fed to the transmitter after the necessary
conversion to the differential signal form and the powering up that is required to control
the driver.
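
A four-tap FIR pre-emphasis of this form can be illustrated with the short Python sketch below. The tap values are placeholders chosen for this example, not register settings from the design, and treating the main tap as positive with negative history taps is an assumption of the sketch. Bits are mapped to +1/-1 to stand in for the differential drive.

```python
def fir_preemphasis(bits, taps=(1.0, -0.25, -0.125, -0.0625)):
    """Return the drive level for each bit using the current bit and
    the three previous bits (tap values are illustrative only)."""
    history = [0.0] * len(taps)      # shift register: newest symbol first
    levels = []
    for bit in bits:
        symbol = 1.0 if bit else -1.0
        history = [symbol] + history[:-1]
        levels.append(sum(t * h for t, h in zip(taps, history)))
    return levels


if __name__ == "__main__":
    print(fir_preemphasis([0, 0, 0, 0, 1, 1, 1, 1]))
```

With these placeholder taps, the first bit after a transition is driven harder than later bits of the same run, so the drive level falls as the run length grows.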

The transmitter architecture is a multiplexing full-rate system. It is supported by
three major analog blocks: a full data rate PLL, a phase buffer to repower the PLL signal
for the driver, and an off-chip driver with built-in pre-emphasis equalization. In
addition, there are specialized circuits for testing of the PLL. Figure 2 shows a block
diagram of the transmitter architecture. A PLL 210 controls a four-stage ring
oscillator 240 running at the full bit frequency. This PLL is shared by four transmitters.
The phase outputs are used as local recovered clocks and to clock the FIR section of the
driver. Word data (eight or ten bits) is clocked into a register synchronously with a word
clock 242 generated from the PLL clock. The word data is transferred two bits at a time
to a dibit data register 230 which is then loaded one bit at a time into the transmit data
register. The final output is transferred at the full bit rate to the driver/equalizer block
226. The transmitter also contains a pseudo-random bit stream (PRBS) generator and
checker 232, which allows for self-testing in a wrap mode as well as link testing with a
corresponding receiver.

RAL920010003US2 (IRA-10-5608)

In more detail, the transmitter structure takes eight or ten bit data from the dibit
data register 230 and synchronously transfers the data two bits at a time through a bit
selector 212 to a first holding latch 214 and a second holding latch 216. The bit selector
212 processes each two-bit data pair least significant bit first, followed by the next
least significant bit. The counter 222 tracks the number of pairs that are processed and
the order of each bit in the pair. When the counter senses that all bit-pairs have been
serialized, the interfacing logic is notified to send another word for processing and the
eight bit/ten bit register is clocked to latch the new data to be processed. The contents of
the first and second latches 214, 216 are transferred in an alternating fashion under the
control of dibit clock 224 to a dibit data register 218 and then to a single latch 220. A bit
stream from this single latch 220 is transmitted to a driver/equalizer 226. This device
takes the bit stream and creates a current-mode differential signal that is frequency
equalized for the assumed media channel. The equalization is a finite impulse response
(FIR) pre-emphasis type using reduced current levels for longer run lengths.
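
The word-to-dibit-to-bit serialization can be sketched as below. This is a behavioural illustration only; the function name and the choice to start with the lowest-order pair of the word are assumptions, while the bit order within each pair (least significant bit first) follows the description.

```python
def serialize_word(word, width=10):
    """Split a word into two-bit pairs and emit each pair LSB first.

    Starting from the lowest-order pair is an assumption of this sketch.
    """
    if width % 2:
        raise ValueError("width must be an even number of bits")
    bits = []
    for pair_index in range(width // 2):       # counter tracking the pairs
        dibit = (word >> (2 * pair_index)) & 0b11
        bits.append(dibit & 1)                 # least significant bit of the pair
        bits.append((dibit >> 1) & 1)          # then the other bit of the pair
    return bits


if __name__ == "__main__":
    print(serialize_word(0b10, width=8))
```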

The driver/equalizer consists of current-mode differential drive circuits which are
controlled by the FIR filter function commonly employed for this purpose. The filter is 
implemented by the combination of a shift register containing the current outgoing data 
bit and a history of the three most recent bits of outgoing data. The shift register, in turn, 
controls the activation of weighted current drivers. 

The receiver architecture is a threefold oversampled, half-rate system with a 54-step
phase rotator, an advanced digital bang-bang control circuit and an implementation of a
sample processing algorithm that centers the static edge midway between two
samples. The receiver takes an NRZ encoded baseband signal from the transmitter and
aligns the edges to determine where the signal switches between '1's and '0's. As with
other signals, the problem is placing the center point between the edges. This is
achieved by sampling the signal and generating early or late signals based on whether the
signal is being sampled too early or too late. When the frequency of the 'up' pulse is
more than that of the 'down' pulse, the system drifts in the 'up' direction. Conversely, it
drifts in the 'down' direction when the frequency of the 'down' pulse is greater than that of
the 'up' pulse.

The phase adjustment and clock recovery are done by a phase rotating device, and
not by using a PLL or delay locked loop (DLL). If there is more than one analog PLL or
DLL on one chip, these circuits tend to interact via supply and/or substrate coupling. It
would, therefore, be difficult to control their phases/delay in an analog fashion. The use
of an independent PLL clock generator and an external phase rotator makes the system
more immune to injected noise. The control of the phase shift is digital. The system
operates at half the bit rate. For analysis of edges and data, however, three bit intervals
are used in order to have a half bit overlap on both sides. Some of the actual samples are
reused in the following analysis cycle. The rotator control state machine has a 'flywheel'
function. It monitors the phase update rate and applies an update even if no edge
information was measured. This allows a TX to RX frequency offset even with very long
(>1024 bit) run length if the jitter of the two clocks is small enough.

The receiver analyzes the oversampled data stream and generates two sets of
correlation output signals: the detected bit values and the early and late signals for an
eventual update of the phase rotator. The early/late signals are generated by use of edge
and data correlation tables of the type shown and described in copending patent
application (Docket No. RAL920010011US2). These tables provide a pattern
recognition algorithm that serves to specify the early and late signals as a function of the
input sample pattern. When the detected bit edge is centered between two samples, there
is a 'dead zone' in the clock and data recovery (CDR) control loop if no jitter is present.
With a jitter number larger than the sample spacing, the loop will average the detected
sample crossings and will position the edge in the middle between two samples. This is a
different situation than that found in a PLL phase detector with a dead zone, because the
jitter is much larger and the phase control is digital with no leakage effects. The
probability of generating a metastable sampling output is reduced for a middle edge
position because the probability of an edge being positioned right on a sample is reduced.

The phase rotator control state machine processes the early and late signals from
the edge correlation outputs to control the phase settings of the phase rotator.

Figure 3 illustrates a block diagram of the receiver architecture. A PLL 310
receives a signal from a reference clock 308. The PLL controls a voltage controlled
three-stage ring oscillator (VCO) running at half the bit frequency. The PLL is shared with
four receivers, one of which, 316, is shown. The six phases from the VCO are fed into a
phase rotator 312 having 54 steps for a 2π interval. The 54 steps are generated with a phase
rotator having six phases with three inter-slice phase steps further divided by three.

The six outputs of the rotator 312 are buffered, and the edges are shaped to be
able to sample a signal having twice the frequency. One of the phase outputs is used as
local recovered clock 314. A clock buffer makes sure that it is not over-loading the phase
rotator. Timing analysis determines which phase is the optimum to use. The output
section of the phase rotator suppresses common mode signals and performs signal
limiting.

The output is then driven out (with the signals from the phase rotator) to the phase
buffers and to a sample latch complex 318 which in turn provides clocks. Six samples
are taken over a two-bit interval. The sample latch complex is a CMOS, positive edge
triggered latch. It takes differential data inputs and, with a single ended clock, outputs a
single ended logic level signal. The complex consists of two circuits, the latch itself and
a buffer that sharpens the output to the receive logic. The retiming latches 320 add three
pipeline stages from the pseudo random bit stream (PRBS) register 360 in order to reduce
the probability of a metastable state to a value much lower than the targeted bit error rate.
They also help to align the data to one single clock phase. In order to be able to process
information from more than one bit interval for the recovery of one data bit, a memory
stage 322 reuses four samples from the previous sampling period. A total of ten samples
is, therefore, fed into the half rate edge and data detection correlation blocks 324, 326,
328, 330 that make use of a pattern recognition algorithm. Truth tables represent the
initial best guess for the data.
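
The way six new samples and four reused samples combine into a ten-sample analysis window can be illustrated as follows. This is a sketch under assumptions: it takes the reused samples to be the last four of the previous period and fills the memory with zeros before the first period, neither of which is stated in the description.

```python
def sample_windows(samples, new_per_cycle=6, reused=4):
    """Yield ten-sample windows: four samples carried over from the
    previous sampling period followed by six new ones."""
    carry = [0] * reused                       # assumed initial memory contents
    for start in range(0, len(samples) - new_per_cycle + 1, new_per_cycle):
        fresh = samples[start:start + new_per_cycle]
        yield carry + fresh
        carry = fresh[-reused:]                # assumed: keep the newest four


if __name__ == "__main__":
    for window in sample_windows(list(range(12))):
        print(window)
```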

The outputs of the edge and data detection block are the two recovered bits and the
early and late signals going to the phase rotator control state machine 340. This involves
the use of a bang-bang control circuit with adaptive step size. The state machine can be
viewed as a digital filter that evaluates the early and late signals and commands an
adjustment of the sample point. The rotator counter and temperature code generator 334
generates the 54 control signals for the phase rotator, and this closes the CDR loop.

The data path consists of a shift register 350 which loads two bits from the data 
correlation blocks during each half-rate cycle. The shift register is loaded to a word data 
register 352 (eight or ten bits) using a word clock derived from the PLL clock. A rate 
counter 354 monitors the shift register 350 and the eight/ten bit register 352. 

The receiver also contains a pseudo-random bit stream (PRBS) generator and
checker which allows for self-testing in a wrap mode as well as link testing with a
corresponding transmitter. The checker comprises a linear feedback shift register (LFSR)
372a which generates a pseudo-random pattern. This pattern is sent through the receiver.
The same pattern is regenerated in a second LFSR 372b. With the XOR, the regenerated
pattern is compared with the data passing through the receiver. Both of these registers
372a, 372b are part of the PRBS function.
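
The generate, regenerate and XOR-compare scheme can be sketched in Python as below. The description does not name the LFSR polynomial; the 7-bit polynomial x^7 + x^6 + 1 (commonly called PRBS7) and the seed are assumptions chosen only to make the example concrete.

```python
from itertools import islice


def prbs7(seed=0x7F):
    """Generate bits from a 7-bit LFSR (assumed polynomial x^7 + x^6 + 1)."""
    state = seed & 0x7F
    while True:
        new_bit = ((state >> 6) ^ (state >> 5)) & 1
        state = ((state << 1) | new_bit) & 0x7F
        yield new_bit


def count_errors(received_bits, seed=0x7F):
    """Regenerate the pattern in a second LFSR and XOR it with the
    received bits; each 1 out of the XOR is a bit error."""
    reference = prbs7(seed)
    return sum(bit ^ next(reference) for bit in received_bits)


if __name__ == "__main__":
    sent = list(islice(prbs7(), 254))
    print(count_errors(sent))      # clean link
    sent[10] ^= 1                  # inject a single bit error
    print(count_errors(sent))
```

This sketch assumes both registers start from the same seed; a real checker would also need a way to synchronize the second register to the incoming stream.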

The method for the phase rotator control is an advanced bang-bang state machine.
As seen in Figure 4, it involves eight-fold initial early/late averaging. It has sixteen states
and may be implemented using four latches. The state machine 340 (Fig. 3) has two
inputs, one for early and one for late. The averaging effect is achieved in the following
manner. The state machine is initially set to state '8'. If several early signals in a row, but
not enough to drive the state to '1', are followed by several late signals, the state machine
averages them out. However, when a preponderance of early or late signals takes the
state machine to '1' or '14', the state machine determines that the sampling is occurring
too early or too late and determines whether to change the sample point. The state
machine produces a 'down' signal when it gets to state '1', and an 'up' signal when it
gets to state '14'. This output signal from the state machine, if it is a 'down', instructs
the rotation counter to adjust the sampling to a later point. Conversely, an 'up' signal
will instruct the counter to adjust the sampling to an earlier point.
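
The averaging behaviour of this state machine can be sketched as follows. Figure 4 is not reproduced here, so two points are assumptions of the sketch: each early input moves the state one step toward '1' and each late input one step toward '14', and the state returns to '8' after an 'up' or 'down' output is produced.

```python
class BangBangAverager:
    """Sixteen-state early/late averager (behavioural sketch)."""

    CENTER, LOW, HIGH = 8, 1, 14

    def __init__(self):
        self.state = self.CENTER

    def step(self, early, late):
        """Consume one early/late pair; return 'down', 'up' or None."""
        if early and not late:
            self.state -= 1
        elif late and not early:
            self.state += 1
        if self.state <= self.LOW:
            self.state = self.CENTER       # assumed reset after an output
            return "down"                  # sample later
        if self.state >= self.HIGH:
            self.state = self.CENTER       # assumed reset after an output
            return "up"                    # sample earlier
        return None


if __name__ == "__main__":
    machine = BangBangAverager()
    print([machine.step(True, False) for _ in range(7)])
```

Short bursts of early inputs followed by late inputs cancel without any output, which is the averaging effect described above.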

The bang-bang control state machine is followed by an up and down counter with 
54 steps (requiring six flip-flops) for the receiver with sample processing. The counter 
has 54 steps and controls where the sample point will be. The counter processes two bits 
at a time in parallel. Thus, there are twenty-seven positions where the sample point can 
be set for each bit. That defines the limits of the resolution. As noted, the state machine
determines whether to change the sample point and the counter determines where the new
sample point will be.
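
A minimal sketch of the 54-step up and down counter is given below. It assumes that the counter wraps around, since the 54 steps span a full 2π interval, and that an 'up' command increases the position index; the mapping between index direction and earlier or later sampling is an assumption of this example.

```python
STEPS = 54   # rotator positions across one 2*pi interval


def next_rotator_position(position, command):
    """Apply an 'up' or 'down' command with wrap-around (assumed)."""
    if command == "up":
        return (position + 1) % STEPS
    if command == "down":
        return (position - 1) % STEPS
    return position                  # no command: hold the sample point


def phase_degrees(position):
    """Phase of a rotator position, in degrees of the half-rate clock."""
    return position * 360.0 / STEPS


if __name__ == "__main__":
    print((STEPS - 1).bit_length())  # flip-flops needed to hold 0..53
    print(phase_degrees(27))
```

Because the half-rate clock period covers two bits, the 54 positions correspond to twenty-seven sample positions per bit, as noted above.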

Figure 5 shows the operation of a second embodiment of the state machine. This 
machine 500 combines early/late averaging with adaptive behavior that changes the 
amount of averaging based on the number of consecutive early or late inputs. When a 
sustained sequence of early or late inputs is received, this state machine reduces the 
amount of averaging in order to increase the stepping rate for the phase rotator. This 
state machine contains 64 states and requires 6 latches. As with the prior embodiment, 
the state machine is followed by an up and down counter with 54 steps requiring six flip- 
flops. 
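
The adaptive idea can be illustrated with the sketch below. It does not reproduce the 64-state encoding of Figure 5. As an assumption made purely for illustration, the amount of averaging is modelled as a threshold that is halved each time two consecutive rotator steps go in the same direction and is restored when the direction reverses.

```python
class AdaptiveAverager:
    """Early/late averager whose averaging shrinks under sustained input
    (illustrative model; thresholds are assumed values)."""

    def __init__(self, max_threshold=8, min_threshold=1):
        self.max_threshold = max_threshold
        self.min_threshold = min_threshold
        self.threshold = max_threshold
        self.balance = 0           # late inputs minus early inputs since last step
        self.last_step = None

    def step(self, early, late):
        """Consume one early/late pair; return 'down', 'up' or None."""
        if early == late:
            return None
        self.balance += 1 if late else -1
        if abs(self.balance) < self.threshold:
            return None
        direction = "up" if self.balance > 0 else "down"
        self.balance = 0
        if direction == self.last_step:
            # Sustained drift: average less so the rotator steps faster.
            self.threshold = max(self.min_threshold, self.threshold // 2)
        else:
            self.threshold = self.max_threshold
        self.last_step = direction
        return direction


if __name__ == "__main__":
    machine = AdaptiveAverager()
    outputs = [machine.step(True, False) for _ in range(30)]
    print([i + 1 for i, out in enumerate(outputs) if out == "down"])
```

Under a sustained run of early inputs the interval between 'down' outputs shrinks, which corresponds to the increased stepping rate described above.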

While the invention has been described in combination with embodiments 
thereof, it is evident that many alternatives, modifications, and variations will be apparent 
to those skilled in the art in light of the foregoing teachings. Accordingly, the invention 
is intended to embrace all such alternatives, modifications and variations as fall within 
the spirit and scope of the appended claims.